AITopics | gaussian policy

Collaborating Authors

gaussian policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Decentralized Diffusion Policy Learning for Enhanced Exploration in Cooperative Multi-agent Reinforcement Learning

Zhang, Yuyang, Balim, Haldun, Li, Na

arXiv.org Machine LearningMay-11-2026

Cooperative multi-agent reinforcement learning (MARL) involves complex agent interactions and requires effective exploration strategies. A prominent class of MARL algorithms, decentralized softmax policy gradient (DecSPG), addresses this through energy-based policy updates. In practice, however, such energy-based policies are intractable to maintain and are commonly projected onto the Gaussian policy class. In this work, we show that the limited expressiveness of Gaussian policies severely hinders exploration in DecSPG, and this limitation worsens as the number of agents grows. To address this issue, we propose decentralized diffusion policy learning (DDPL), which parameterizes each agent's policy with a denoising diffusion probabilistic model, an expressive generative model that captures multi-modal action distributions for enhanced exploration. DDPL enables efficient online training of diffusion policies via importance sampling score matching (ISSM), a novel training method with theoretical guarantee. We evaluate DDPL on representative continuous-action MARL benchmarks, including multi-agent particle environment, multi-agent MuJoCo, IsaacLab, and JAX-reimplemented StarCraft multi-agent challenge, and observe consistently improved performance.

algorithm 1, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

2605.07101

Genre: Research Report (0.50)

Industry:

Education > Educational Setting > Online (0.54)
Energy (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.90)

Add feedback

Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization

Neural Information Processing SystemsMar-20-2026, 21:20:40 GMT

Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality. It has been verified that utilizing diffusion policies can significantly improve the performance of RL algorithms in continuous control tasks by overcoming the limitations of unimodal policies, such as Gaussian policies. Furthermore, the multimodality of diffusion policies also shows the potential of providing the agent with enhanced exploration capabilities. However, existing works mainly focus on applying diffusion policies in offline RL, while their incorporation into online RL has been less investigated. The diffusion model's training objective, known as the variational lower bound, cannot be applied directly in online RL due to the unavailability of'good' samples (actions).

diffusion policy, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.79)

Add feedback

IsBang-BangControlAllYouNeed? SolvingContinuousControlwithBernoulliPolicies

Neural Information Processing SystemsFeb-11-2026, 15:19:01 GMT

Real-world robotics tasks commonly manifest ascontrol problems overcontinuous action spaces. When learning to act in such settings, control policies are typically represented as continuous probability distributions that cover all feasible control inputs - often Gaussians. The underlying assumption is that this enables more refined decisions compared to crude policy choices such as discretized controllers, which limit the search space but induce abrupt changes. While switching controls canbeundesirable inpractice astheymaychallenge stability andaccelerate system weardown, they are theoretically feasible and even arise as optimal strategies in some settings.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Add feedback

56577889b3c1cd083b6d7b32d32f99d5-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-8-2026, 11:46:34 GMT

convergence, global convergence, numerical result, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.30)

Add feedback

A Minimum-fuel cost (MF) derivation When c(a(t)) = |a (t) |

Neural Information Processing SystemsNov-15-2025, 22:39:08 GMT

The action cost differs by row, where the corresponding optimal policy parameterization is highlighted in green.

action cost, action penalty, gaussian policy, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Add feedback

Policy Transfer for Continuous-Time Reinforcement Learning: A (Rough) Differential Equation Approach

Guo, Xin, Lyu, Zijiu

arXiv.org Artificial IntelligenceNov-12-2025

This paper studies policy transfer, one of the well-known transfer learning techniques adopted in large language models, for two classes of continuous-time reinforcement learning problems. In the first class of continuous-time linear-quadratic systems with Shannon's entropy regularization (a.k.a. LQRs), we fully exploit the Gaussian structure of their optimal policy and the stability of their associated Riccati equations. In the second class where the system has possibly non-linear and bounded dynamics, the key technical component is the stability of diffusion SDEs which is established by invoking the rough path theory. Our work provides the first theoretical proof of policy transfer for continuous-time RL: an optimal policy learned for one RL problem can be used to initialize the search for a near-optimal policy in a closely related RL problem, while maintaining the convergence rate of the original algorithm. To illustrate the benefit of policy transfer for RL, we propose a novel policy learning algorithm for continuous-time LQRs, which achieves global linear convergence and local super-linear convergence. As a byproduct of our analysis, we derive the stability of a concrete class of continuous-time score-based diffusion models via their connection with LQRs.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2510.15165

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Education (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

Add feedback

9713faa264b94e2bf346a1bb52587fd8-AuthorFeedback.pdf

Neural Information Processing SystemsOct-3-2025, 06:39:06 GMT

general rl problem, optimal policy, revision, (12 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.31)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.30)

Add feedback

convergence of several policy gradient methods, whose novelty is summarized in Lines 210-212 and further explained

Neural Information Processing SystemsOct-2-2025, 23:18:20 GMT

R1.1 ...these analysis mainly come from the existing work...the novelty is very limited. Our proposed SRVR-NPG has a better complexity than SRVR-PG (Remark 4.13). We believed our theoretical contrition already has archival value. R1.3 Reproducibility: We believe that all of our theoretical claims have been proved. Please refer to [34] for a detailed proof.

artificial intelligence, convergence, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.30)

Add feedback

World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation

Jiang, Zhennan, Liu, Kai, Qin, Yuxin, Tian, Shuai, Zheng, Yupeng, Zhou, Mingcai, Yu, Chao, Li, Haoran, Zhao, Dongbin

arXiv.org Artificial IntelligenceSep-24-2025

Robotic manipulation policies are commonly initialized through imitation learning, but their performance is limited by the scarcity and narrow coverage of expert data. Reinforcement learning can refine polices to alleviate this limitation, yet real-robot training is costly and unsafe, while training in simulators suffers from the sim-to-real gap. Recent advances in generative models have demonstrated remarkable capabilities in real-world simulation, with diffusion models in particular excelling at generation. This raises the question of how diffusion model-based world models can be combined to enhance pre-trained policies in robotic manipulation. In this work, we propose World4RL, a framework that employs diffusion-based world models as high-fidelity simulators to refine pre-trained policies entirely in imagined environments for robotic manipulation. Unlike prior works that primarily employ world models for planning, our framework enables direct end-to-end policy optimization. World4RL is designed around two principles: pre-training a diffusion world model that captures diverse dynamics on multi-task datasets and refining policies entirely within a frozen world model to avoid online real-world interactions. We further design a two-hot action encoding scheme tailored for robotic manipulation and adopt diffusion backbones to improve modeling fidelity. Extensive simulation and real-world experiments demonstrate that World4RL provides high-fidelity environment modeling and enables consistent policy refinement, yielding significantly higher success rates compared to imitation learning and other baselines. More visualization results are available at https://world4rl.github.io/.

machine learning, reinforcement learning, world model, (13 more...)

arXiv.org Artificial Intelligence

2509.1908

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.34)

Technology: